1. Summary. Consider samples from two continuous populations, the first with unique $100\alpha\%$ point $\theta_\alpha$ and the second with unique $100\beta\%$ point $\theta_\beta$. The two populations are not necessarily the same or even related. This paper presents some easily applied significance tests for $\theta_\alpha - \theta_\beta$ which are approximately valid for moderate and large sized samples. The exact significance level of a test is not known, but its value is determined within reasonably close limits. Efficiency properties of these tests are investigated for the special case of normal populations with known ratio of variances. The tests are found to be reasonably efficient if $\alpha$ and $\beta$ are not too large or too small. Since these tests are often valid for moderate as well as large sized samples, they may be of practical value.

2. Introduction and descriptive outline. A problem of occasional practical interest is that of comparing a specified percentage point $\theta_\alpha$ of one arbitrary population with a specified percentage point $\theta_\beta$ of another arbitrary population. For example, it might be desired to test whether the 10% point of the first population exceeds the 15% point of the second population by more than 11 units. As another example, one might wish to test whether the 93% point of the first population is the same as the point obtained by subtracting 5 units from the 24% point of the second population. Since little is known concerning the distribution functions of the populations, however, most methods developed for testing $\theta_\alpha - \theta_\beta$ require very large samples. The purpose of this paper is to present some tests of $\theta_\alpha - \theta_\beta$ which are applicable to moderate sized samples for many situations of practical interest.

Before outlining the chain of reasoning used to obtain the results presented in this paper, let us consider a large sample method of obtaining tests for $\theta_\alpha - \theta_\beta$. Both continuous populations are assumed to have probability density functions. Let the first population have density function $f(x)$ and the second population density function $g(y)$. These two functions are arbitrary except that $f(\theta_\alpha) \neq 0$, $g(\theta_\beta) \neq 0$, and $f'$ and $g'$ exist and are continuous in the vicinity of the specified points $\theta_\alpha$ and $\theta_\beta$, respectively. Let $x(1), \ldots, x(m)$ represent the values, arranged in increasing order of magnitude, of a sample of size $m$ from the population with density $f(x)$, while $y(1), \ldots, y(n)$ denote the values, arranged in increasing order of magnitude, of a sample of size $n$ from the population with density $g(y)$. Then asymptotically ($m, n \to \infty$) the distribution of

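(1) $\left[x(\alpha m) - y(\beta n) - (\theta_\alpha - \theta_\beta)\right] \Big/ \sqrt{\dfrac{\alpha(1-\alpha)}{m f(\theta_\alpha)^2} + \dfrac{\beta(1-\beta)}{n g(\theta_\beta)^2}}$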

is standard normal (i.e., zero mean, unit variance). Here $\alpha m$ and $\beta n$ are integers. This is a direct application of a modification of the results of [1, p. 369]. For many situations of practical interest, the distribution of (1) is nearly standard normal for values of $m$ and $n$ which are not extremely large (see, e.g., [2]). Thus if $f(\theta_\alpha)$ and $g(\theta_\beta)$ were known, (1) could be used to test $\theta_\alpha - \theta_\beta$ in practical situations involving moderately large samples. Although $f(\theta_\alpha)$ and $g(\theta_\beta)$ are not known, (1) can be modified to yield large sample tests for $\theta_\alpha - \theta_\beta$. For example, $f(\theta_\alpha)$ and $g(\theta_\beta)$ can be replaced by estimates, based on the sample values, which converge in probability to $f(\theta_\alpha)$ and $g(\theta_\beta)$ as $m, n \to \infty$. The resulting modification of (1) is then asymptotically standard normal; this follows from combining (1) with the convergence theorem of [1, p. 254]. A refinement of this method using the results of [3] could also be applied. Even if $f(\theta_\alpha)$ and $g(\theta_\beta)$ were known, however, use of (1) would not necessarily yield sufficiently accurate results for moderate values of $m$ and $n$.
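As an illustration only, statistic (1) can be computed when $f(\theta_\alpha)$ and $g(\theta_\beta)$ are treated as known. The Python sketch below assumes exactly that; the function name `large_sample_z` and its arguments are hypothetical, and the normal populations in the demonstration are arbitrary choices, not part of the method.

```python
import math
from statistics import NormalDist


def large_sample_z(x, y, alpha, beta, f_a, g_b, delta0):
    """Statistic (1) under the hypothesis theta_alpha - theta_beta = delta0.

    f_a and g_b stand for the (assumed known) density values f(theta_alpha)
    and g(theta_beta). As in (1), alpha*m and beta*n should be integers;
    they are rounded here only to absorb floating-point noise.
    """
    xs, ys = sorted(x), sorted(y)
    m, n = len(xs), len(ys)
    i, j = round(alpha * m), round(beta * n)
    numerator = xs[i - 1] - ys[j - 1] - delta0  # order statistics are 1-based
    variance = (alpha * (1 - alpha) / (m * f_a ** 2)
                + beta * (1 - beta) / (n * g_b ** 2))
    return numerator / math.sqrt(variance)


if __name__ == "__main__":
    # Illustrative populations only: N(0, 1) and N(0.5, 2^2).
    X, Y = NormalDist(0, 1), NormalDist(0.5, 2)
    alpha, beta = 0.3, 0.6
    theta_a, theta_b = X.inv_cdf(alpha), Y.inv_cdf(beta)
    x, y = X.samples(200, seed=1), Y.samples(200, seed=2)
    z = large_sample_z(x, y, alpha, beta, X.pdf(theta_a), Y.pdf(theta_b),
                       theta_a - theta_b)
    print(f"statistic (1): {z:.3f}")  # approximately standard normal for large samples
```

Repeating the demonstration over many seeds would give a rough picture of how close (1) is to standard normal at a given sample size.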

This paper develops tests which appear to be reasonably accurate for moderate as well as large values of $m$ and $n$ if the populations are of the type approximated in practical situations. These tests are based on statistics of the form

(2) $x(\alpha m + C_1\sqrt{m}) - y(\beta n + C_2\sqrt{n})$.

By appropriate choice of $C_1$ and $C_2$, the values of quantities of the form

(3) $\Pr\left[x(\alpha m + C_1\sqrt{m}) - y(\beta n + C_2\sqrt{n}) - (\theta_\alpha - \theta_\beta) < 0\right]$

can be confined to fairly small intervals for moderate $m$ and $n$, where the values within an interval are all of a magnitude suitable for significance levels. The same holds for quantities of the form

$\Pr\left[x(\alpha m + C_1\sqrt{m}) - y(\beta n + C_2\sqrt{n}) - (\theta_\alpha - \theta_\beta) > 0\right]$.

Thus (2) can be used to compare $\theta_\alpha - \theta_\beta$ with a given hypothetical value $\mu_0$. For example, $\theta_\alpha - \theta_\beta < \mu_0$ can be investigated by one-sided tests of the form

(4) Accept $\theta_\alpha - \theta_\beta < \mu_0$ if $x(\alpha m + C_1\sqrt{m}) - y(\beta n + C_2\sqrt{n}) < \mu_0$.

From (3), the significance level of this test can be fixed within reasonably close limits (for moderate $m$ and $n$) by suitable choice of $C_1$ and $C_2$.
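A sketch of statistic (2) and test (4) for given constants $C_1$, $C_2$ follows. It borrows the nearest-integer convention that section 3 adopts for non-integer ranks; the helper names `order_stat`, `statistic_2`, and `accept_below` are hypothetical.

```python
import math


def order_stat(sorted_vals, rank):
    """1-based order statistic at the integer nearest to `rank`."""
    k = math.floor(rank + 0.5)
    if not 1 <= k <= len(sorted_vals):
        raise ValueError(f"rank {rank:.2f} falls outside 1..{len(sorted_vals)}")
    return sorted_vals[k - 1]


def statistic_2(x, y, alpha, beta, c1, c2):
    """x(alpha*m + C1*sqrt(m)) - y(beta*n + C2*sqrt(n)), as in (2)."""
    xs, ys = sorted(x), sorted(y)
    m, n = len(xs), len(ys)
    return (order_stat(xs, alpha * m + c1 * math.sqrt(m))
            - order_stat(ys, beta * n + c2 * math.sqrt(n)))


def accept_below(x, y, alpha, beta, c1, c2, mu0):
    """Test (4): True means 'accept theta_alpha - theta_beta < mu0'."""
    return statistic_2(x, y, alpha, beta, c1, c2) < mu0
```

For instance, with both samples equal to $1, \ldots, 100$, $\alpha = \beta = .5$, $C_1 = 1$ and $C_2 = 0$, the statistic is $x(60) - y(50) = 10$.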

Let us consider an outline of the method used to derive the tests. For this purpose it is sufficient to limit consideration to one-sided tests of the form (4). The first step of the derivation consists in determining the asymptotic distribution of (2) under the assumption that $f(\theta_\alpha)$ and $g(\theta_\beta)$ are known. The result obtained is that the asymptotic distribution of a certain function of (2), $\theta_\alpha - \theta_\beta$, $f(\theta_\alpha)$, $g(\theta_\beta)$, $C_1$, and $C_2$ is standard normal. To emphasize the dependence on $f(\theta_\alpha)$ and $g(\theta_\beta)$, let this function be denoted by $Z[f(\theta_\alpha), g(\theta_\beta)]$.

Since $f(\theta_\alpha)$ and $g(\theta_\beta)$ are not known, it would be convenient if $C_1$ and $C_2$ could be chosen so that the value of (3) is (asymptotically) independent of the true values of these quantities. This "Studentization" can be accomplished to a reasonable approximation. Let the interval $(\gamma, \delta)$ include the set of possible values of (3) when $f(\theta_\alpha)$ and $g(\theta_\beta)$ are replaced by arbitrary positive numbers. By suitable choice of $C_1$ and $C_2$, $\gamma$ and $\delta$ can be made to lie fairly close together and to have values suitable for significance levels (asymptotically).

Now consider the case where $m$ and $n$ are not large. Let the function values $f(\theta_\alpha)$ and $g(\theta_\beta)$ be replaced by the arbitrary positive parameters $A$ and $B$ in the function $Z[f(\theta_\alpha), g(\theta_\beta)]$. By appropriate selection of the values of $A$ and $B$, the cumulative distribution function (cdf) of $Z[A, B]$ is fitted to the standard normal cdf in the small interval $\gamma$ to $\delta$. There are intuitive reasons for believing that a reasonably accurate fit can be obtained in this small interval for moderate $m$ and $n$ if the two populations are of the types approximated in practice. Thus for practical situations the significance level of (4) should lie within or near the interval $\gamma$ to $\delta$ for moderate values of $m$ and $n$.

The basic trick in the derivations lies in selecting $C_1$ and $C_2$ so that replacing $f(\theta_\alpha)$ and $g(\theta_\beta)$ by numbers which may differ greatly from the true values of these quantities does not cause much variation in the value of (3). This allows the cdf's to be fitted in the small interval while keeping the possible significance levels in a fairly narrow range. Since the interval where the fitting takes place is small and a wide latitude in the choice of the parameters is available, it seems reasonable to believe that an accurate fit can be obtained for moderate $m$ and $n$.

The tests advocated by this paper are stated in section 3. That section also presents some "rule of thumb" conditions for deciding when $m$ and $n$ are sufficiently large for the tests to be applicable.

A detailed derivation of the properties of the tests presented in section 3 is contained in section 4.

To obtain an approximate lower bound for the efficiency of the tests for $\theta_\alpha - \theta_\beta$, the case where both populations are normal and the ratio of variances is known was analyzed for large $m$ and $n$. The resulting efficiencies should be much lower than those ordinarily encountered, both because of the additional information assumed and because the efficiency of non-parametric results usually decreases as the sample size increases. It is found that the efficiency of the tests is reasonably high if $.1 < \alpha, \beta < .9$. Further results and derivations are given in section 5.

To obtain a rough quantitative idea of how large $m$ and $n$ need to be for sufficiently accurate tests, the special case where $\alpha = \beta$ and $f(x) \equiv g(x)$ is considered. The significance level of a test is then exactly determined and can be obtained without a substantial amount of computation. Section 6 contains an analysis of this case in which the exact results are checked against the range $\gamma$ to $\delta$. It is found that the true significance level lies within or near the range $\gamma$ to $\delta$ even for fairly small values of $m$ and $n$.

The procedures used in this paper are based on some methods originally presented by one of the authors in [4].

3. Statement of tests. This section contains an explicit specification of the tests discussed in the preceding sections.

First let us consider one-sided tests of $\theta_\alpha - \theta_\beta < \mu_0$. Let $\epsilon$ be the approximate significance level desired. Then the test advocated is

(5) Accept $\theta_\alpha - \theta_\beta < \mu_0$ if $x\left(\alpha m + .85K_\epsilon\sqrt{\alpha(1-\alpha)m}\right) - y\left(\beta n - .85K_\epsilon\sqrt{\beta(1-\beta)n}\right) < \mu_0$,

where $K_\epsilon$ is the standardized normal deviate exceeded with probability $\epsilon$. Here $\alpha m + .85K_\epsilon\sqrt{\alpha(1-\alpha)m}$ and $\beta n - .85K_\epsilon\sqrt{\beta(1-\beta)n}$ should be integers or nearly equal to integers. If $\alpha m + .85K_\epsilon\sqrt{\alpha(1-\alpha)m}$ is not an integer, $x\left(\alpha m + .85K_\epsilon\sqrt{\alpha(1-\alpha)m}\right)$ is interpreted as $x$(integer nearest to $\alpha m + .85K_\epsilon\sqrt{\alpha(1-\alpha)m}$); similarly for $y\left(\beta n - .85K_\epsilon\sqrt{\beta(1-\beta)n}\right)$. An approximate lower bound for the significance level of (5) is $\gamma$, where $\gamma$ is defined by the relation $K_\gamma = 1.25K_\epsilon$. An approximate upper bound is $\delta$, where $K_\delta = .83K_\epsilon$. For example, let $\epsilon = .05$; then $\gamma = .020$ and $\delta = .086$. As another example, let $\epsilon = .01$; then $\gamma = .0018$ and $\delta = .027$. For most situations of practical interest, it appears likely that the true value of the significance level of (5) will be much nearer $\gamma$ than $\delta$.
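The bounds $\gamma$ and $\delta$ follow directly from $K_\epsilon$. The Python sketch below uses the standard library's `statistics.NormalDist` for the normal deviates; the function names `k_dev` and `level_bounds` are illustrative. For $\epsilon = .05$ it gives $\gamma \approx .020$ and $\delta \approx .086$, in line with the first example above.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()


def k_dev(p):
    """Standardized normal deviate exceeded with probability p (K_p)."""
    return STD_NORMAL.inv_cdf(1 - p)


def level_bounds(eps):
    """Approximate bounds (gamma, delta) for the significance level of (5) or (6),
    from K_gamma = 1.25 K_eps and K_delta = .83 K_eps."""
    k_eps = k_dev(eps)
    gamma = 1 - STD_NORMAL.cdf(1.25 * k_eps)
    delta = 1 - STD_NORMAL.cdf(0.83 * k_eps)
    return gamma, delta


for eps in (0.05, 0.025, 0.01):
    gamma, delta = level_bounds(eps)
    print(f"eps={eps:<6} gamma={gamma:.4f} delta={delta:.4f}")
```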

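Next consider one-sided tests of $\theta_\alpha - \theta_\beta > \mu_0$. If $\epsilon$ is the approximate significance level desired, the test advocated is

(6) Accept $\theta_\alpha - \theta_\beta > \mu_0$ if $x\left(\alpha m - .85K_\epsilon\sqrt{\alpha(1-\alpha)m}\right) - y\left(\beta n + .85K_\epsilon\sqrt{\beta(1-\beta)n}\right) > \mu_0$.

Here $x\left(\alpha m - .85K_\epsilon\sqrt{\alpha(1-\alpha)m}\right)$ and $y\left(\beta n + .85K_\epsilon\sqrt{\beta(1-\beta)n}\right)$ have interpretations of the type stated for (5), and $\alpha m - .85K_\epsilon\sqrt{\alpha(1-\alpha)m}$, $\beta n + .85K_\epsilon\sqrt{\beta(1-\beta)n}$ should be integers or nearly equal to integers. As for test (5), an approximate lower bound for the significance level of (6) is $\gamma$, while $\delta$ is an approximate upper bound.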

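Two-sided tests of $\theta_\alpha - \theta_\beta \neq \mu_0$ can be obtained by combining (5) and (6). For example, a two-sided test with desired significance level approximately $2\epsilon$ and nearly equal tails is given by

Accept $\theta_\alpha - \theta_\beta \neq \mu_0$ if either

$x\left(\alpha m + .85K_\epsilon\sqrt{\alpha(1-\alpha)m}\right) - y\left(\beta n - .85K_\epsilon\sqrt{\beta(1-\beta)n}\right) < \mu_0$

or

$x\left(\alpha m - .85K_\epsilon\sqrt{\alpha(1-\alpha)m}\right) - y\left(\beta n + .85K_\epsilon\sqrt{\beta(1-\beta)n}\right) > \mu_0$.

An approximate lower bound for the significance level of this test is $2\gamma$; an approximate upper bound is $2\delta$.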
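Tests (5), (6), and their two-sided combination can be sketched in a few lines of Python. This is an illustrative transcription, not a published implementation: the function names are hypothetical, non-integer ranks are rounded to the nearest integer as stated for (5), and a `ValueError` is raised if a rank falls outside the sample.

```python
import math
from statistics import NormalDist


def _order_stat(sorted_vals, rank):
    # Section 3 convention: use the integer nearest to a non-integer rank.
    k = math.floor(rank + 0.5)
    if not 1 <= k <= len(sorted_vals):
        raise ValueError(f"rank {rank:.2f} falls outside 1..{len(sorted_vals)}")
    return sorted_vals[k - 1]


def shifted_difference(x, y, alpha, beta, eps, sign):
    """Statistic of test (5) for sign=+1 and of test (6) for sign=-1."""
    xs, ys = sorted(x), sorted(y)
    m, n = len(xs), len(ys)
    c = 0.85 * NormalDist().inv_cdf(1 - eps)  # .85 K_eps
    rank_x = alpha * m + sign * c * math.sqrt(alpha * (1 - alpha) * m)
    rank_y = beta * n - sign * c * math.sqrt(beta * (1 - beta) * n)
    return _order_stat(xs, rank_x) - _order_stat(ys, rank_y)


def accept_less(x, y, alpha, beta, eps, mu0):
    """Test (5): True means 'accept theta_alpha - theta_beta < mu0'."""
    return shifted_difference(x, y, alpha, beta, eps, +1) < mu0


def accept_greater(x, y, alpha, beta, eps, mu0):
    """Test (6): True means 'accept theta_alpha - theta_beta > mu0'."""
    return shifted_difference(x, y, alpha, beta, eps, -1) > mu0


def accept_unequal(x, y, alpha, beta, eps, mu0):
    """Two-sided test: True means 'accept theta_alpha - theta_beta != mu0'."""
    return (accept_less(x, y, alpha, beta, eps, mu0)
            or accept_greater(x, y, alpha, beta, eps, mu0))
```

With both samples equal to $1, \ldots, 100$, $\alpha = \beta = .5$ and $\epsilon = .05$, the ranks used by (5) are 57 and 43, so its statistic is $57 - 43 = 14$.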
Comparison of (5) with (4) shows that for the special case (5), $C_1 = .85K_\epsilon\sqrt{\alpha(1-\alpha)}$ and $C_2 = -.85K_\epsilon\sqrt{\beta(1-\beta)}$. Thus the values of $C_1$ and $C_2$ chosen for test (5) are not large in magnitude; indeed $|C_1|, |C_2| \leq .425K_\epsilon$, since $\alpha(1-\alpha)$ and $\beta(1-\beta)$ never exceed $1/4$. Similar considerations hold for test (6).

The decision as to when $m$ and $n$ are large enough for these tests to be applicable is a difficult one. Section 6 contains an investigation of this problem for a special case, but there is no reason to believe that the results for this special case hold in most situations of practical interest. In general, however, it seems reasonable to believe that the accuracy of the tests decreases as $\alpha$ and $\beta$ deviate from $1/2$. For example, a test with $\alpha = .4$, $\beta = .55$ will likely be sufficiently accurate for much smaller values of $m$ and $n$ than a test with $\alpha = .005$, $\beta = .998$. For the sake of definiteness, a rule for deciding when $m$ and $n$ are sufficiently large will be given. It is hoped that this rule will be conservative for most practical situations; in particular, it appears to be conservative for the special case of section 6. It should be emphasized, however, that the rule presented is no more than a conjecture and has no strong theoretical or empirical basis. The rule is

Accept that $m$ and $n$ are sufficiently large if $\min[\alpha m, (1-\alpha)m, \beta n, (1-\beta)n] \geq 5$.

In use of this rule, the value of $\epsilon$ should not be too small; say, $\epsilon \geq .005$.
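The rule is simple enough to encode directly. The sketch below reads the inequality as $\geq 5$ and adds a small floating-point tolerance; both are assumptions of this illustration, and `sizes_large_enough` is a hypothetical name. Passing the check is only the conjectured condition above, not a guarantee of accuracy.

```python
def sizes_large_enough(m, n, alpha, beta, eps):
    """Conjectured rule of section 3 (not a guarantee):
    min[alpha*m, (1-alpha)*m, beta*n, (1-beta)*n] >= 5, with eps >= .005."""
    tol = 1e-9  # guards against products such as (1 - 0.9) * 50 landing just below 5
    smallest = min(alpha * m, (1 - alpha) * m, beta * n, (1 - beta) * n)
    return smallest >= 5 - tol and eps >= 0.005 - tol


print(sizes_large_enough(10, 10, 0.5, 0.5, 0.05))   # smallest term is exactly 5
print(sizes_large_enough(10, 10, 0.4, 0.5, 0.05))   # alpha*m = 4 is too small
```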

4. Derivations. This section contains proofs of the results and properties stated in the preceding sections.

First, let us consider the asymptotic distribution of (2). The theorem on which this is based was presented in [5] in a form slightly different from that used in this paper. For completeness and convenience of reference, the version used here will be stated and an outline of the proof presented.

Theorem. Let $z(1), \ldots, z(r)$ denote the values of a sample of size $r$ (arranged in increasing order of magnitude) from a population with probability density function $h(z)$. The function $h(z)$ has the properties that $h(\xi_p) \neq 0$ and that $h'(z)$ exists and is continuous in some neighborhood of $\xi_p$, where $\xi_p$ is the $100p\%$ point of the population. Then the quantity

$\sqrt{r/p(1-p)}\; h(\xi_p)\left[z(pr + C\sqrt{r}) - \xi_p\right]$

has a distribution which approaches the normal distribution with mean $C/\sqrt{p(1-p)}$ and unit variance as $r \to \infty$. Here $pr + C\sqrt{r}$ is restricted to be an integer.

Proof. The method used to prove this theorem is analogous to a modification of that of [1, pp. 368-69] if $r$ is used instead of $n$ and $pr$ is replaced by $pr + C\sqrt{r}$.
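The theorem can be checked exactly for one convenient population. For the uniform population on (0, 1), $h \equiv 1$, $\xi_p = p$, and $z(k)$ has a Beta$(k, r-k+1)$ distribution, so the mean and variance of the standardized quantity are available in closed form. The Python sketch below (the choice of the uniform population and the function name are illustrative assumptions) compares them with the limits $C/\sqrt{p(1-p)}$ and 1.

```python
import math


def standardized_moments(r, p, c):
    """Exact mean and variance of sqrt(r/(p(1-p))) * h(xi_p) * [z(k) - xi_p]
    for a uniform(0, 1) population, with k the integer nearest to p*r + c*sqrt(r).

    Here h(xi_p) = 1, xi_p = p, and z(k) ~ Beta(k, r - k + 1).
    """
    k = math.floor(p * r + c * math.sqrt(r) + 0.5)
    scale = math.sqrt(r / (p * (1 - p)))
    mean_zk = k / (r + 1)
    var_zk = k * (r - k + 1) / ((r + 1) ** 2 * (r + 2))
    return scale * (mean_zk - p), scale ** 2 * var_zk


for r in (100, 10_000, 1_000_000):
    mean, var = standardized_moments(r, p=0.3, c=0.5)
    print(f"r={r:>9}: mean={mean:.4f} (limit {0.5 / math.sqrt(0.21):.4f}), var={var:.4f}")
```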

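The asymptotic distribution of (2) is obtained by applying this theorem to each of the two samples and then considering $x(\alpha m + C_1\sqrt{m}) - y(\beta n + C_2\sqrt{n})$. Explicitly, it is found that the asymptotic distribution of the quantity

(7) $\dfrac{x(\alpha m + C_1\sqrt{m}) - y(\beta n + C_2\sqrt{n}) - (\theta_\alpha - \theta_\beta)}{\sqrt{\dfrac{\alpha(1-\alpha)}{m f(\theta_\alpha)^2} + \dfrac{\beta(1-\beta)}{n g(\theta_\beta)^2}}}$

is normal with unit variance and mean $M$ equal to

$M = \left[\dfrac{C_1}{\sqrt{m}} - \dfrac{C_2 f(\theta_\alpha)}{\sqrt{n}\,g(\theta_\beta)}\right] \Big/ \sqrt{\dfrac{\alpha(1-\alpha)}{m} + \dfrac{\beta(1-\beta) f(\theta_\alpha)^2}{n\,g(\theta_\beta)^2}}$.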

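Now consider the choice of $C_1$ and $C_2$ so that $M$ is almost constant as a function of $f(\theta_\alpha)$ and $g(\theta_\beta)$. To do this, the value of $C_1/C_2$ is restricted so that $M$ has the same value for $f(\theta_\alpha)/g(\theta_\beta) = 0$ as for $f(\theta_\alpha)/g(\theta_\beta) = \infty$. Since these two limiting values of $M$ are $C_1/\sqrt{\alpha(1-\alpha)}$ and $-C_2/\sqrt{\beta(1-\beta)}$, respectively, this requires that

(8) $C_1/C_2 = -\sqrt{\alpha(1-\alpha)/\beta(1-\beta)}$.

Using this relation and solving for the maximum and minimum values of $M$ as a function of $f(\theta_\alpha)$ and $g(\theta_\beta)$, it is found that

(9) $C_1/\sqrt{\alpha(1-\alpha)} \leq M \leq \sqrt{2}\,C_1/\sqrt{\alpha(1-\alpha)}$ $\quad (C_1 > 0)$,
$\phantom{(9)}$ $\sqrt{2}\,C_1/\sqrt{\alpha(1-\alpha)} \leq M \leq C_1/\sqrt{\alpha(1-\alpha)}$ $\quad (C_1 < 0)$.

If $f(\theta_\alpha)/g(\theta_\beta)$ is either very small or very large, the value of $M$ is near $C_1/\sqrt{\alpha(1-\alpha)}$. If $f(\theta_\alpha)/g(\theta_\beta)$ is in the vicinity of 1, the value of $M$ is near $\sqrt{2}\,C_1/\sqrt{\alpha(1-\alpha)}$; this value is attained exactly when $f(\theta_\alpha)\sqrt{m\beta(1-\beta)} = g(\theta_\beta)\sqrt{n\alpha(1-\alpha)}$. Examination of (9) shows that the range of possible variation for $M$ is not great if (8) is satisfied.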
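The bounds (9) can be checked numerically. The sketch below evaluates $M$ from the expression following (7), with $C_2$ fixed by (8), over a wide grid of ratios $f(\theta_\alpha)/g(\theta_\beta)$ (taking $g(\theta_\beta) = 1$ without loss of generality); the particular values of $\alpha$, $\beta$, $m$, $n$, and $C_1$ are arbitrary choices for illustration.

```python
import math


def mean_M(c1, c2, alpha, beta, m, n, f_a, g_b):
    """Asymptotic mean M of (7), in the form given in the text."""
    num = c1 / math.sqrt(m) - c2 * f_a / (math.sqrt(n) * g_b)
    den = math.sqrt(alpha * (1 - alpha) / m
                    + beta * (1 - beta) * f_a ** 2 / (n * g_b ** 2))
    return num / den


alpha, beta, m, n, c1 = 0.3, 0.6, 40, 60, 1.0
a, b = alpha * (1 - alpha), beta * (1 - beta)
c2 = -c1 * math.sqrt(b / a)  # relation (8)
ratios = [10 ** (e / 100) for e in range(-400, 401)]  # f/g from 1e-4 to 1e4
values = [mean_M(c1, c2, alpha, beta, m, n, r, 1.0) for r in ratios]
print(f"min M = {min(values):.4f}  vs  C1/sqrt(a)         = {c1 / math.sqrt(a):.4f}")
print(f"max M = {max(values):.4f}  vs  sqrt(2) C1/sqrt(a) = {math.sqrt(2) * c1 / math.sqrt(a):.4f}")
```

Over four orders of magnitude in each direction, $M$ stays within a factor of $\sqrt{2}$, which is what makes the significance level nearly independent of the unknown densities.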

Asymptotically, the value of $M$ determines the significance level of a test. For example, consider a test of the form (4). Asymptotically, the significance level of this test is determined by

(10) $\Pr\left[x(\alpha m + C_1\sqrt{m}) - y(\beta n + C_2\sqrt{n}) - (\theta_\alpha - \theta_\beta) < 0\right] = \Pr\left[(7) - M < -M\right] = \dfrac{1}{\sqrt{2\pi}}\displaystyle\int_{-\infty}^{-M} e^{-z^2/2}\,dz$,

since the asymptotic distribution of $(7) - M$ is standard normal. Thus choosing $C_1$ and $C_2$ so that the value of $M$ does not vary much implies that asymptotically the significance level of a test is fairly closely determined without any knowledge of the values of $f(\theta_\alpha)$ and $g(\theta_\beta)$.
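Relation (10) turns the bounds (9) into a range of asymptotic significance levels. The Python sketch below (function names illustrative) computes that range for the choice $C_1 = .85K_\epsilon\sqrt{\alpha(1-\alpha)}$ made for test (5) in section 3. It is the purely asymptotic range; it does not include the allowance for approximation discussed at the end of this section.

```python
import math
from statistics import NormalDist


def asymptotic_level(M):
    """Relation (10): asymptotic significance level of test (4) when (7) has mean M."""
    return NormalDist().cdf(-M)


def asymptotic_level_range(c1, alpha):
    """Range of the asymptotic level of (4) implied by (9), assuming C1 > 0 and (8)."""
    m_low = c1 / math.sqrt(alpha * (1 - alpha))
    m_high = math.sqrt(2) * m_low
    return asymptotic_level(m_high), asymptotic_level(m_low)


# Example: C1 = .85 K_eps sqrt(alpha(1 - alpha)) with eps = .05, as for test (5).
alpha, eps = 0.3, 0.05
c1 = 0.85 * NormalDist().inv_cdf(1 - eps) * math.sqrt(alpha * (1 - alpha))
print(asymptotic_level_range(c1, alpha))
```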

Comparison with (1) indicates that the distribution of $(7) - M$ is nearly standard normal for values of $m$ and $n$ which are not extremely large, particularly if $C_1$ and $C_2$ are not large in magnitude (as is the case for the tests of section 3). Thus it is apparent that the tests of section 3 are usually valid for moderately large values of $m$ and $n$. The following intuitive considerations indicate that these tests are applicable even if $m$ and $n$ are not very large.

In (7) and $M$, let $f(\theta_\alpha)$ and $g(\theta_\beta)$ be replaced by the arbitrary positive parameters $A$ and $B$, respectively. Denote the resulting functions by (7') and $M'$. For any test, the problem is to fit the cdf of $(7') - M'$ to a certain small part of the cdf of the standard normal distribution by appropriate choice of $A$ and $B$. Since the part to be fitted is small and the parameters can assume a wide range of values, it seems plausible that this fit can be made rather accurate even for moderately small values of $m$ and $n$.

For one-sided tests, the interval where the two cdf's are fitted is from the minimum possible value of $M'$ to the maximum possible value of $M'$ (as a function of $A$ and $B$). The limiting values for $M'$ are given by (9), since $M$ and $M'$ obviously have the same range of possible values. If an accurate fit can be obtained in this interval, the significance level range for moderate $m$ and $n$ is approximately the same as the range for the asymptotic case. As an illustration, consider a test of the form (4). The significance level of this test is approximately determined by the relations

(11) $\Pr\left[x(\alpha m + C_1\sqrt{m}) - y(\beta n + C_2\sqrt{n}) - (\theta_\alpha - \theta_\beta) < 0\right] = \Pr\left[(7') - M' < -M'\right] \approx \dfrac{1}{\sqrt{2\pi}}\displaystyle\int_{-\infty}^{-M'} e^{-z^2/2}\,dz$,

since the cdf of $(7') - M'$ is nearly standard normal in the range of possible values for $M'$. But the range of possible values for $M'$ equals the range of possible values for $M$, so the approximate equality of the significance level ranges follows from (10).

Now let us consider the determination of the values of $C_1$ and $C_2$ used for the tests of section 3. Test (5) was obtained by choosing a representative value of $M'$ in the interval specified by (9) for $C_1 > 0$. Then $C_1$ was determined so that the significance level of test (4) would be $\epsilon$ if $M'$ actually had this value and the cdf of $(7') - M'$ were standard normal. The value of $C_2$, of course, was found from (8). Using the value of $C_1$, the relation (11), and the bounds for $M'$, the values of $\gamma$ and $\delta$ could be determined from the relations

$K_\gamma = 1.175K_\epsilon, \quad K_\delta = .833K_\epsilon$.

However, to allow for the fact that the fit to the standard normal cdf in the specified section is only approximate, and that $A/B$ is ordinarily neither large nor near zero (so that $M'$ is usually near $\sqrt{2}\,C_1/\sqrt{\alpha(1-\alpha)}$), the relations used were

$K_\gamma = 1.25K_\epsilon, \quad K_\delta = .83K_\epsilon$.

Similar considerations apply for test (6).

5. Efficiency investigation. Only asymptotic situations will be considered (i.e., both $m, n \to \infty$). It is desired to determine how tests based on (5) and (6) compare with the corresponding tests based on the non-central $t$-statistic for the case where both populations are normal and the ratio of variances is known.

Let $\sigma_x^2$ be the variance of the population with density $f(x)$, while $\sigma_y^2$ is the variance of the population with density $g(y)$. Asymptotically, use of the non-central $t$-statistic for testing $\theta_\alpha - \theta_\beta$ is equivalent to use of the quantity

(12) $\left[\bar{x} - \bar{y} - K_\alpha s_x + K_\beta s_y - (\theta_\alpha - \theta_\beta)\right] \Big/ \sqrt{(1 + K_\alpha^2/2)\sigma_x^2/m + (1 + K_\beta^2/2)\sigma_y^2/n}$,

where

$\bar{x} = \sum_{i=1}^{m} x(i)/m$, $\quad \bar{y} = \sum_{j=1}^{n} y(j)/n$, $\quad s_x^2 = \sum_{i=1}^{m} [x(i) - \bar{x}]^2/(m-1)$, $\quad s_y^2 = \sum_{j=1}^{n} [y(j) - \bar{y}]^2/(n-1)$.

The asymptotic distribution of (12) is standard normal.

Since the asymptotic case is being considered, $(7) - M$ is the quantity of interest for tests based on (5) and (6). Comparison of $(7) - M$ and (12) shows that if the sample sizes for (12) were decreased in the ratio

(13) $\dfrac{(1 + K_\alpha^2/2) + \phi(1 + K_\beta^2/2)}{2\pi\left[\alpha(1-\alpha)e^{K_\alpha^2} + \phi\,\beta(1-\beta)e^{K_\beta^2}\right]}$,

where $\phi = m\sigma_y^2/n\sigma_x^2$, then the asymptotic distributions of $(7) - M$ and (12) would be the same, and both $(7) - M$ and (12) would have the same denominator. Thus confidence intervals for $\theta_\alpha - \theta_\beta$ obtained on the basis of $(7) - M$ are asymptotically equivalent to the corresponding confidence intervals obtained on the basis of (12) with $m$ and $n$ decreased in the ratio (13). This property also applies to the significance tests based on these confidence intervals. The ratio (13) will be called the asymptotic efficiency of the tests based on (5) and (6). Actually, (13) is merely the variance of $\bar{x} - \bar{y} - K_\alpha s_x + K_\beta s_y$ divided by the variance of $x(\alpha m + C_1\sqrt{m}) - y(\beta n + C_2\sqrt{n})$ for large $m$ and $n$; the latter variance follows from (7) with the normal density values $f(\theta_\alpha) = e^{-K_\alpha^2/2}/\sigma_x\sqrt{2\pi}$ and $g(\theta_\beta) = e^{-K_\beta^2/2}/\sigma_y\sqrt{2\pi}$.
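The efficiency ratio (13) can be evaluated directly. The Python sketch below is a transcription of (13) using `statistics.NormalDist` for $K_p$; the function names are illustrative. Because (13) is a ratio of two linear functions of $\phi$, its value for any intermediate $\phi$ lies between the $\phi = 0$ and $\phi \to \infty$ limits, which the printed values make visible.

```python
import math
from statistics import NormalDist


def k_dev(p):
    """Standardized normal deviate exceeded with probability p."""
    return NormalDist().inv_cdf(1 - p)


def asymptotic_efficiency(alpha, beta, phi):
    """Ratio (13); phi = m * sigma_y**2 / (n * sigma_x**2)."""
    ka2, kb2 = k_dev(alpha) ** 2, k_dev(beta) ** 2
    num = (1 + ka2 / 2) + phi * (1 + kb2 / 2)
    den = 2 * math.pi * (alpha * (1 - alpha) * math.exp(ka2)
                         + phi * beta * (1 - beta) * math.exp(kb2))
    return num / den


for phi in (0.0, 0.1, 1.0, 10.0, 1e6):
    print(f"phi={phi:g}: efficiency={asymptotic_efficiency(0.1, 0.3, phi):.3f}")
```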

The value of $\phi$ can lie anywhere between 0 and $\infty$. Since (13) is a ratio of two linear functions of $\phi$, it varies monotonically in $\phi$; consequently the asymptotic efficiency of the tests based on (5) and (6) can be anywhere between the value of

$(1 + K_\alpha^2/2)\big/2\pi\alpha(1-\alpha)e^{K_\alpha^2}$

and the value of

$(1 + K_\beta^2/2)\big/2\pi\beta(1-\beta)e^{K_\beta^2}$.

Table 1 contains values of

(14) $(1 + K_p^2/2)\big/2\pi p(1-p)e^{K_p^2}$

for $p = .01, .02, .05, .10, .20, .30, .40, .50$. The value of (14) is the same for $p = 1 - v$ as for $p = v$. Examination of Table 1 shows that the amount of "information" lost by using tests based on (5) and (6) rather than the non-central $t$-statistic is not too great if $\alpha$ and $\beta$ are not near 0 or 1.
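Table 1 can be recomputed from (14) with modern normal quantiles. The Python sketch below does so as a check only, not as a reproduction of the original computation; its printed values may differ slightly in the last digit from the published entries. It also confirms the value $2/\pi \approx .637$ at $p = .5$ and the symmetry between $p$ and $1 - p$.

```python
import math
from statistics import NormalDist


def efficiency_14(p):
    """Expression (14): (1 + K_p^2/2) / (2 pi p (1-p) exp(K_p^2))."""
    k2 = NormalDist().inv_cdf(1 - p) ** 2
    return (1 + k2 / 2) / (2 * math.pi * p * (1 - p) * math.exp(k2))


P_VALUES = (0.01, 0.02, 0.05, 0.10, 0.20, 0.30, 0.40, 0.50)
print("p     " + "  ".join(f"{p:.2f} " for p in P_VALUES))
print("(14)  " + "  ".join(f"{efficiency_14(p):.3f}" for p in P_VALUES))
```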

6. Investigation of special case. In section 3 a rule was presented for deciding when $m$ and $n$ are sufficiently large for the tests of section 3 to be applicable. The purpose of this section is to check the accuracy of this rule for the special case where $f(x) \equiv g(x)$ and $\alpha = \beta$ (then $\theta_\alpha = \theta_\beta$).

First let us consider some implications of the rule stated in section 3. An obvious result is that $m \geq 10$, $n \geq 10$. Since $.005 \leq \epsilon \leq .5$, it follows that $0 \leq K_\epsilon \leq 2.58$. Combining these properties, it is seen that the order statistics $x(1)$, $x(m)$, $y(1)$, $y(n)$ are never used for a test when the rule is applied.

Let $m \geq 10$, $n \geq 10$, $n \leq m$ (no loss of generality), and consider the value of $\Pr[x(u) - y(v) < 0]$ for the special case. Here $u$ and $v$ are integers such that $2 \leq u \leq m-1$ and $2 \leq v \leq n-1$. From [6], the value of $\Pr[x(u) - y(v) < 0]$ equals
The first step of the empirical analysis consists in selecting suitable values for $m$, $n$, $\alpha$, $\epsilon$. Corresponding values of $u$ and $v$ are then determined according to the requirements for test (5); i.e., $u$ is the integer nearest to $\alpha m + .85K_\epsilon\sqrt{\alpha(1-\alpha)m}$, $v$ is the integer nearest to $\alpha n - .85K_\epsilon\sqrt{\alpha(1-\alpha)n}$, and both $\alpha m + .85K_\epsilon\sqrt{\alpha(1-\alpha)m}$ and $\alpha n - .85K_\epsilon\sqrt{\alpha(1-\alpha)n}$ are nearly equal to integers. Then the value of (15) is computed and compared with the corresponding values of $\gamma$ and $\delta$. This is done for $m \geq n = 10, 15, 20, 25$ and various values of $\alpha$ and $\epsilon$. The results of these computations are contained in Table 2.

Examination of Table 2 shows that the value of (15) is usually near $\gamma$. This is to be expected for the special situation considered. The derivation of $\gamma$ and $\delta$ shows that $\delta$ is approached when $A/B$ is near 0 or $\infty$, while $\gamma$ is approximated if $A/B$ is in the vicinity of 1. Since the populations are the same for the case considered, the value of $A/B$ should not differ greatly from unity. Thus the values obtained for (15) can be considered very good approximations to those expected for situations of this type. Consequently the rule of section 3 would seem to be adequate for this special case (i.e., $f \equiv g$, $\alpha = \beta$).
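For the special case $f \equiv g$, the probability $\Pr[x(u) < y(v)]$ can be computed exactly by counting arrangements: all $\binom{m+n}{m}$ orderings of the pooled sample are equally likely, and $x(u) < y(v)$ exactly when at least $u$ of the $x$'s precede $y(v)$. The Python sketch below uses this standard combinatorial argument; it is not necessarily the form of (15) taken from [6], and the function names are illustrative. It chooses $u$ and $v$ as for test (5) and reports the exact level alongside $\gamma$ and $\delta$.

```python
import math
from fractions import Fraction
from statistics import NormalDist


def prob_x_before_y(m, n, u, v):
    """Exact Pr[x(u) < y(v)] when both samples come from the same continuous
    population. K = number of x's below y(v) has
    Pr[K = k] = C(k+v-1, k) * C(m-k+n-v, m-k) / C(m+n, m),
    and x(u) < y(v) exactly when K >= u."""
    total = math.comb(m + n, m)
    count = sum(math.comb(k + v - 1, k) * math.comb(m - k + n - v, m - k)
                for k in range(u, m + 1))
    return Fraction(count, total)


def special_case_check(m, n, alpha, eps):
    """Ranks u, v chosen as for test (5) with alpha = beta, the exact level
    Pr[x(u) < y(v)], and the approximate bounds gamma, delta."""
    z = NormalDist()
    k_eps = z.inv_cdf(1 - eps)
    u = math.floor(alpha * m + 0.85 * k_eps * math.sqrt(alpha * (1 - alpha) * m) + 0.5)
    v = math.floor(alpha * n - 0.85 * k_eps * math.sqrt(alpha * (1 - alpha) * n) + 0.5)
    level = float(prob_x_before_y(m, n, u, v))
    gamma = 1 - z.cdf(1.25 * k_eps)
    delta = 1 - z.cdf(0.83 * k_eps)
    return u, v, level, gamma, delta


for size in (10, 15, 20, 25):
    u, v, level, gamma, delta = special_case_check(size, size, 0.5, 0.05)
    print(f"m=n={size}: u={u}, v={v}, exact level={level:.4f}, "
          f"gamma={gamma:.4f}, delta={delta:.4f}")
```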
TABLE 1

Values of $(1 + K_p^2/2)\big/2\pi p(1-p)e^{K_p^2}$

p       .01    .02    .05    .10    .20    .30    .40    .50
(14)    .265   .?70   .5?6   .6?5   .66?   .654   .6?5   .637

